<!DOCTYPE group PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<group>

<p><b>VLFeat</b> offers a hierarchical version of integer k-means, which
recursively applies <code>vl_ikmeans</code> to compute finer and finer
partitions. For more details see 
<a href="%pathto:root;api/hikmeans_8h.html">Hierarchical Integer
  k-means API reference</a> and the <a href="%pathto:tut.ikm;">Integer
  k-means tutorial</a>.
</p> 

<ul>
  <li><a href="%pathto:tut.hikm.usage;">Usage</a></li>
  <li><a href="%pathto:tut.hikm.tree;">Tree structure</a></li>
  <li><a href="%pathto:tut.hikm.elkan;">Elkan</a></li>
</ul>

<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
<h1 id="tut.hikm.usage">Usage</h1>
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->

<p>First, we generate some random data to cluster in <code>[0,255]^2</code>:</p>

<pre>
data     = uint8(rand(2,10000) * 255) ;
datat    = uint8(rand(2,100000)* 255) ;
</pre>

<p>To cluster this data, we simply use <code>vl_hikmeans</code>:</p>

<pre>
K        = 3 ;
nleaves  = 100 ;
[tree,A] = vl_hikmeans(data,K,nleaves) ;
</pre>

<p>Here <code>nleaves</code> is the desired number of leaf
clusters. The algorithm terminates when there are at least
<code>nleaves</code> nodes, creating a tree with <code>depth =
  floor(log(K nleaves))</code></p>

<p>To assign labels to the new data, we use <code>vl_hikmeanspush</code>:</p>

<pre>
AT       = vl_hikmeanspush(tree,datat) ;
</pre>

<div class="figure">
<image src="%pathto:root;demo/hikmeans-tree.jpg"/>
<image src="%pathto:root;demo/hikmeans-clusters.jpg"/>
<div class="caption">
<span class="content">
<b>Hierarchical integer K-means.</b>  Left: A depiction of the
recursive clusters. Each node is a cluster center. The root note is
not depicted (its center would be the mean of the dataset).  Right:
Clusters are represented as different colors (here are more than 100
clusters, but only three colors are used).
</span>
</div>
</div>

<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
<h1 id="tut.hikm.tree">Tree structure</h1>
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->

<p>The output <code>tree</code> is a MATLAB structure representing the tree of
clusters:</p>

<pre>
> tree
tree =
 
          K: 3
      depth: 5
    centers: [2x3 int32]
        sub: [1x3 struct]
</pre>

<p>The field <code>centers</code> is the matrix of the cluster centers at the
root node.  If the depth of the tree is larger than 1, then the field
<code>sub</code> is a structure array with one entry for each cluster. Each
element is in turn a tree:</p>

<pre>
> tree.sub
ans = 

1x3 struct array with fields:
    centers
    sub
</pre>

<p>with a field <code>centers</code> for its clusters and a field
<code>sub</code> for its children. When there are no children, this
field is equal to the empty matrix</p>

<pre>
> tree.sub(1).sub(1).sub(1).sub(1)

ans = 

    centers: [2x3 int32]
        sub: []
</pre>

<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
<h1 id="tut.hikm.elkan">Elkan</h1>
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->

<p>VLFeat supports two different implementations of k-means. While they
produce identical output, the Elkan method is sometimes faster.
The <code>method</code> parameters controls which method is used. Consider the case when <code>K=10</code> and our data is now 128 dimensional (e.g. SIFT descriptors):</p>

<pre>
K=10;
nleaves = 1000;
data = uint8(rand(128,10000) * 255);
tic;
[tree,A] = vl_hikmeans(data,K,nleaves,'method', 'lloyd') ; % default
t_lloyd = toc
tic;
[tree,A] = vl_hikmeans(data,K,nleaves,'method', 'elkan') ;
t_elkan = toc

t_lloyd =

    8.0743

t_elkan =

    3.0427
</pre>

</group>
